risk measure
On the Sample Complexity of Discounted Reinforcement Learning with Optimized Certainty Equivalents
Mortensen, Oliver, Talebi, Mohammad Sadegh
We study risk-sensitive reinforcement learning in finite discounted MDPs, where a generative model of the MDP is assumed to be available. We consider a family or risk measures called the optimized certainty equivalent (OCE), which includes important risk measures such as entropic risk, CVaR, and mean-variance. Our focus is on the sample complexities of learning the optimal state-action value function (value learning) and an optimal policy (policy learning) under recursive OCE. We provide an exact characterization of utility functions $u$ for which the corresponding OCE defines an objective that is PAC-learnable. We analyze a simple model-based approach and derive PAC sample complexity bounds. We establish that whenever $u$ does not have full domain $\text{dom}(u)\neq \mathbb{R}$, the corresponding problem is not PAC-learnable. Finally, we establish corresponding lower bounds for both value and policy learning, demonstrating tightness in the size $SA$ of state-action space, and for a more restricted class of utilities, we derive lower bounds that makes the dependence on the effective horizon $\frac{1}{1-ฮณ}$ explicit. Specifically, for $\text{CVaR}_ฯ$ we show that the correct dependence on $ฯ$ is $\frac{1}{ฯ^2}$, thus improving by a factor of $\frac{1}ฯ$ over state-of-the-art although our bound has a suboptimal dependence on $\frac{1}{1-ฮณ}$.
Pessimistic Risk-Aware Policy Learning in Contextual Bandits
Wan, Yilong, Li, Yuqiang, Wu, Xianyi
We study risk-aware offline policy learning, aiming to learn a decision rule from logged data that is optimal under general risk criteria. This problem is crucial in high-stakes domains where online interaction is infeasible and adverse outcomes must be carefully controlled. However, existing literature on offline contextual bandits either centers on expected-reward criteria or restricts risk considerations to policy evaluation instead of optimization. In this work, we propose a unified distributional framework for optimizing Lipschitz-continuous risk functionals, a broad class of risk measures encompassing mean-variance, entropic risk, and conditional value-at-risk, among others. By developing novel empirical concentration inequalities for importance sampling-based distributional estimators, our analysis derives data-dependent suboptimality bounds with an $\tilde{\mathcal{O}}(1/\sqrt{n})$ rate, without relying on restrictive uniform overlap assumptions. This rate is minimax optimal and matches that of risk-neutral offline policy optimization, indicating that optimizing general Lipschitz risk criteria incurs no additional statistical cost relative to the expected-reward.
Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning
Many real-world domains require safe decision making in uncertain environments. In this work, we introduce a deep reinforcement learning framework for approaching this important problem. We consider a distribution over transition models, and apply a risk-averse perspective towards model uncertainty through the use of coherent distortion risk measures. We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems. Unlike existing approaches to robustness in deep reinforcement learning, however, our formulation does not involve minimax optimization. This leads to an efficient, model-free implementation of our approach that only requires standard data collection from a single training environment. In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.
Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees
Requests for name changes in the electronic proceedings will be accepted with no questions asked. However name changes may cause bibliographic tracking issues. Authors are asked to consider this carefully and discuss it with their co-authors prior to requesting a name change in the electronic proceedings. Use the Report an Issue link to request a name change.